{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "64be0343",
   "metadata": {},
   "source": [
    "# Accessing Low Level Vector APIs\n",
    "\n",
    "[txtai](https://github.com/neuml/txtai) is an all-in-one AI framework for semantic search, LLM orchestration and language model workflows.\n",
    "\n",
    "The primary interface to build vector databases with `txtai` is through [Embeddings instances](https://neuml.github.io/txtai/embeddings/). `txtai` also supports accessing all of it's features through lower level APIs. \n",
    "\n",
    "Let's dive in.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8198312",
   "metadata": {},
   "source": [
    "# Install dependencies\n",
    "\n",
    "Install `txtai` and all dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f2d0792",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "!pip install git+https://github.com/neuml/txtai#egg=txtai[ann] gguf"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "46740a21",
   "metadata": {},
   "source": [
    "# Load a dataset\n",
    "\n",
    "We'll use a [subset](https://huggingface.co/datasets/m-a-p/FineFineWeb-test) of the [FineFineWeb dataset](https://huggingface.co/datasets/m-a-p/FineFineWeb). This dataset is a domain-labeled version of the general purpose [FineWeb dataset](https://huggingface.co/datasets/HuggingFaceFW/fineweb). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c13e554",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "\n",
    "ds = load_dataset(\"m-a-p/FineFineWeb-test\", split=\"train\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae49af96",
   "metadata": {},
   "source": [
    "# Building an Embeddings database\n",
    "\n",
    "Before going into the low-level API, let's recap how we build an Embeddings database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "2de3bf4c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.6012564897537231 The National Aeronautics and Space Administration (NASA) is the United States’ civil space program. \n"
     ]
    }
   ],
   "source": [
    "from txtai import Embeddings\n",
    "\n",
    "embeddings = Embeddings()\n",
    "embeddings.index(ds[\"text\"][:10000])\n",
    "for uid, score in embeddings.search(\"nasa\", 1):\n",
    "    print(score, ds[uid][\"text\"][:100])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6f196cb",
   "metadata": {},
   "source": [
    "This simple example abstracts the heavy lifting behind the `Embeddings` interface. Behind the scenes, it defaults to vectorizing text using [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). Vectors are stored in a [Faiss index](https://github.com/facebookresearch/faiss).\n",
    "\n",
    "The first 10K records are vectorized and stored in the vector index. Then at query time, the query is vectorized and a vector similarity search is run.\n",
    "\n",
    "While the `Embeddings` interface is convenient, it's also possible to access lower level APIs. "
   ]
  },
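  {
   "cell_type": "markdown",
   "id": "5a1c9e01",
   "metadata": {},
   "source": [
    "To make the two steps described above concrete, here is a minimal NumPy sketch of a brute-force vector similarity search. It is not how `txtai` or Faiss implement search. The `normalize` and `topk` helpers and the toy 3-dimensional vectors are assumptions made up for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def normalize(x):\n",
    "    # Scale each row to unit length so a dot product equals cosine similarity\n",
    "    return x / np.linalg.norm(x, axis=1, keepdims=True)\n",
    "\n",
    "def topk(query, data, k=1):\n",
    "    # Score every row against the query and return the k best (index, score) pairs\n",
    "    scores = normalize(data) @ normalize(query[None, :])[0]\n",
    "    best = np.argsort(-scores)[:k]\n",
    "    return [(int(i), float(scores[i])) for i in best]\n",
    "\n",
    "# Toy vectors standing in for real model output\n",
    "toy = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])\n",
    "results = topk(np.array([0.9, 0.1, 0.0]), toy, k=2)\n",
    "print(results)\n",
    "```\n",
    "\n",
    "The returned pairs have the same `(index, score)` shape as the search results used throughout this notebook. An ANN backend trades this exhaustive scan for faster, approximate lookups."
   ]
  },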
  {
   "cell_type": "markdown",
   "id": "ecd576e8",
   "metadata": {},
   "source": [
    "# Vectors Interface\n",
    "\n",
    "First, let's vectorize our data using the low level APIs. We'll use the default Hugging Face vectorizer available in `txtai`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "03c85684",
   "metadata": {},
   "outputs": [],
   "source": [
    "from txtai.ann import ANNFactory\n",
    "from txtai.vectors import VectorsFactory\n",
    "\n",
    "vectors = VectorsFactory.create({\"path\": \"sentence-transformers/all-MiniLM-L6-v2\"})\n",
    "data = vectors.vectorize(ds[\"text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36fac81a",
   "metadata": {},
   "source": [
    "# ANN Interface\n",
    "\n",
    "Now that we have a NumPy array of vectors, let's store them in an Approximate Neighest Neighbor (ANN) backend. Recall earlier, we used the default Faiss interface. For this example, we're going to use the [PyTorch ANN](https://neuml.github.io/txtai/embeddings/configuration/ann/#torch). This will allow us to use new features that are available as of `txtai` 9.1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "34ebdc85",
   "metadata": {},
   "outputs": [],
   "source": [
    "ann = ANNFactory.create({\n",
    "    \"backend\": \"torch\",\n",
    "    \"torch\": {\n",
    "        \"safetensors\": True,\n",
    "    }\n",
    "})\n",
    "ann.index(data)\n",
    "ann.save(\"vectors.safetensors\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98a26c9a",
   "metadata": {},
   "source": [
    "This ANN builds a Torch tensor with the vectors and stores them in a [Safetensors](https://github.com/huggingface/safetensors) file.\n",
    "\n",
    "The code below shows how the file is simply a standard Safetensors file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "bcaa4832",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "data (1411868, 384)\n",
      "Memory = 2068.17 MB\n"
     ]
    }
   ],
   "source": [
    "from safetensors import safe_open\n",
    "\n",
    "def tensorinfo():\n",
    "    memory = 0\n",
    "    with safe_open(\"vectors.safetensors\", framework=\"np\") as f:\n",
    "        for key in f.keys():\n",
    "            array = f.get_tensor(key)\n",
    "            print(key, array.shape)\n",
    "            memory += array.nbytes\n",
    "\n",
    "    print(f\"Memory = {memory / 1024 / 1024:.2f} MB\")\n",
    "\n",
    "tensorinfo()"
   ]
  },
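  {
   "cell_type": "markdown",
   "id": "5a1c9e02",
   "metadata": {},
   "source": [
    "The reported size can be reproduced from the tensor shape alone. The short check below assumes the vectors are stored as 32-bit floats, which is consistent with the figure printed above.\n",
    "\n",
    "```python\n",
    "rows, dimensions = 1411868, 384\n",
    "\n",
    "# float32 uses 4 bytes per dimension\n",
    "size = rows * dimensions * 4\n",
    "mb = size / 1024 / 1024\n",
    "\n",
    "print(f\"Memory = {mb:.2f} MB\")\n",
    "```\n",
    "\n",
    "This prints `Memory = 2068.17 MB`, matching the Safetensors file."
   ]
  },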
  {
   "cell_type": "markdown",
   "id": "d6525b1a",
   "metadata": {},
   "source": [
    "# Vector search\n",
    "\n",
    "Now let's show how these low-level APIs can be used to implement vector search."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "09b81958",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The answer to your question, that how many miles is it from earth to mars, is very easy to know. Because of huge satellites which are being sent to\n",
      "mars in search of life from many countries, we have discovered a lot about mars. According to experts, earth and mars reaches to their closest points\n",
      "in every 26 months. This situation is considered as opposition of mars as the location of sun and mars in totally opposite to each other in relation\n",
      "to earth. When this opposition takes place, the planet is visible with a red tint in the sky from earth. And this also gives mars a name, i.e. the red\n",
      "planet. Mars is also the fourth planet from sun, which is located between Jupiter and earth. Its distance from sun is not only opposite but is also\n",
      "much further away, than that of the earth and sun. The distance between the sun and mars is said to be 140 million miles. Mars can reach about 128\n",
      "million miles closer to the sun whereas it can even travel around 154 million miles away from it. The assumed distance between mars and earth is said\n",
      "to be between 40 to 225 million miles. The distance between these two planets keeps on changing throughout the year because of the elliptical path in\n",
      "which all the planets rotate. As the distance between mars, sun and earth is so much high, it takes a Martian year, for mars to go around the sun. The\n",
      "Martian period includes a time of around 687 earth days. This means that, it takes more than 2 years for the mars to reach its initial rotation point.\n",
      "If we talk about one Martian day, it is the total time which is taken by a planet to spin around once. This day usually lasts longer than our regular\n",
      "earth days. So this was the actual reason which states the distance between earth and mars. \n",
      "\n",
      " 0.7060051560401917\n"
     ]
    }
   ],
   "source": [
    "import textwrap\n",
    "\n",
    "def search(text):\n",
    "    result = ann.search(vectors.vectorize([text]), 1)\n",
    "    index, score = result[0][0]\n",
    "    print(textwrap.fill(ds[index][\"text\"], width=150), \"\\n\\n\", score)\n",
    "\n",
    "search(\"How far is earth from mars?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8adf64db",
   "metadata": {},
   "source": [
    "# Torch 4-bit quantization\n",
    "\n",
    "`txtai` 9.1 adds a new feature: 4-bit vector quantization. This means that instead of using 32-bit floats for each vector dimension, this method uses 4 bits. This reduces memory usage to ~12-13% of the original size."
   ]
  },
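  {
   "cell_type": "markdown",
   "id": "5a1c9e03",
   "metadata": {},
   "source": [
    "As a rough illustration of the idea, the sketch below does block-wise 4-bit quantization in plain NumPy. It is not the `txtai` implementation. The `quantize` and `dequantize` helpers are made up for this example, the uniform 16-level codebook is a stand-in for the actual NF4 codebook, and the block size of 64 is an assumption.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def quantize(x, blocksize=64):\n",
    "    # Split into blocks and keep one absolute maximum (scale) per block\n",
    "    blocks = x.reshape(-1, blocksize)\n",
    "    absmax = np.abs(blocks).max(axis=1, keepdims=True)\n",
    "    absmax[absmax == 0] = 1.0\n",
    "\n",
    "    # Uniform 16-level codebook over [-1, 1], a stand-in for the NF4 codebook\n",
    "    code = np.linspace(-1, 1, 16, dtype=np.float32)\n",
    "\n",
    "    # Map each scaled value to the index (0-15) of the nearest code\n",
    "    index = np.abs((blocks / absmax)[..., None] - code).argmin(axis=-1).astype(np.uint8)\n",
    "\n",
    "    # Pack two 4-bit indexes into each byte\n",
    "    flat = index.reshape(-1)\n",
    "    packed = (flat[0::2] << 4) | flat[1::2]\n",
    "    return packed, absmax.astype(np.float32).reshape(-1), code\n",
    "\n",
    "def dequantize(packed, absmax, code, shape, blocksize=64):\n",
    "    # Unpack the two 4-bit indexes stored in each byte\n",
    "    index = np.stack([packed >> 4, packed & 0x0F], axis=1).reshape(-1)\n",
    "    return (code[index].reshape(-1, blocksize) * absmax[:, None]).reshape(shape)\n",
    "\n",
    "# Random vectors standing in for real embeddings\n",
    "rng = np.random.default_rng(0)\n",
    "x = rng.standard_normal((8, 384)).astype(np.float32)\n",
    "\n",
    "packed, absmax, code = quantize(x)\n",
    "restored = dequantize(packed, absmax, code, x.shape)\n",
    "\n",
    "print(packed.nbytes + absmax.nbytes, \"bytes vs\", x.nbytes)\n",
    "print(\"max error\", float(np.abs(x - restored).max()))\n",
    "```\n",
    "\n",
    "Each value costs 4 bits plus a share of its block's 32-bit scale, so the quantized form is 1,728 bytes against 12,288 for the original. The restored values are close to, but not identical to, the input."
   ]
  },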
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "cbe5af13",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "absmax (8471208,)\n",
      "code (16,)\n",
      "data (271078656, 1)\n",
      "shape (2,)\n",
      "Memory = 290.84 MB\n"
     ]
    }
   ],
   "source": [
    "ann = ANNFactory.create({\n",
    "    \"backend\": \"torch\",\n",
    "    \"torch\": {\n",
    "        \"safetensors\": True,\n",
    "        \"quantize\": {\n",
    "            \"type\": \"nf4\"\n",
    "        }\n",
    "    }\n",
    "})\n",
    "ann.index(data)\n",
    "ann.save(\"vectors.safetensors\")\n",
    "\n",
    "tensorinfo()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca7b1e0e",
   "metadata": {},
   "source": [
    "Note how the unquantized vectors took 2068.17 MB and this only takes 290.84 MB! With quantization and ever growing GPUs, this opens the possibility of pinning your entire vector database in GPU memory!\n",
    "\n",
    "For example, let's extrapolate this dataset to 100M rows.\n",
    "\n",
    "```\n",
    "(290.84 MB / 1,411,868) * 100,000,000 = 20,599.7 MB\n",
    "```\n",
    "\n",
    "An entire 100M row dataset could fit into a single RTX 3090 / 4090 consumer GPU!\n",
    "\n",
    "Let's confirm search still works the same."
   ]
  },
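  {
   "cell_type": "markdown",
   "id": "5a1c9e04",
   "metadata": {},
   "source": [
    "The extrapolation above can be checked with a few lines of arithmetic. This is a back-of-the-envelope estimate only: it assumes the per-row footprint stays constant and it ignores the memory needed for the model and for query-time working buffers.\n",
    "\n",
    "```python\n",
    "# Figures reported earlier in this notebook\n",
    "quantized_mb, rows = 290.84, 1411868\n",
    "\n",
    "# Scale the per-row footprint up to a hypothetical 100M row dataset\n",
    "target = 100_000_000\n",
    "estimate = quantized_mb / rows * target\n",
    "\n",
    "# 24 GB of memory on an RTX 3090 / 4090, expressed in MB\n",
    "gpu = 24 * 1024\n",
    "\n",
    "print(f\"{estimate:,.1f} MB of {gpu:,} MB ({estimate / gpu:.0%})\")\n",
    "```\n",
    "\n",
    "The vectors alone would take up about 84% of a 24 GB card, so the fit is feasible but leaves limited headroom."
   ]
  },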
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "a02578dd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The answer to your question, that how many miles is it from earth to mars, is very easy to know. Because of huge satellites which are being sent to\n",
      "mars in search of life from many countries, we have discovered a lot about mars. According to experts, earth and mars reaches to their closest points\n",
      "in every 26 months. This situation is considered as opposition of mars as the location of sun and mars in totally opposite to each other in relation\n",
      "to earth. When this opposition takes place, the planet is visible with a red tint in the sky from earth. And this also gives mars a name, i.e. the red\n",
      "planet. Mars is also the fourth planet from sun, which is located between Jupiter and earth. Its distance from sun is not only opposite but is also\n",
      "much further away, than that of the earth and sun. The distance between the sun and mars is said to be 140 million miles. Mars can reach about 128\n",
      "million miles closer to the sun whereas it can even travel around 154 million miles away from it. The assumed distance between mars and earth is said\n",
      "to be between 40 to 225 million miles. The distance between these two planets keeps on changing throughout the year because of the elliptical path in\n",
      "which all the planets rotate. As the distance between mars, sun and earth is so much high, it takes a Martian year, for mars to go around the sun. The\n",
      "Martian period includes a time of around 687 earth days. This means that, it takes more than 2 years for the mars to reach its initial rotation point.\n",
      "If we talk about one Martian day, it is the total time which is taken by a planet to spin around once. This day usually lasts longer than our regular\n",
      "earth days. So this was the actual reason which states the distance between earth and mars. \n",
      "\n",
      " 0.6982609033584595\n"
     ]
    }
   ],
   "source": [
    "search(\"How far is earth from mars?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0e62670",
   "metadata": {},
   "source": [
    "Same result. Note the score is slightly different but this is expected."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89abb301",
   "metadata": {},
   "source": [
    "# GGUF Support\n",
    "\n",
    "`txtai` 9.1 also adds support for [GGML](https://github.com/ggml-org/ggml) / [GGUF](https://huggingface.co/docs/hub/en/gguf) popularized by [llama.cpp](https://github.com/ggml-org/llama.cpp)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "e04498fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "ann = ANNFactory.create({\n",
    "    \"backend\": \"ggml\",\n",
    "    \"ggml\": {\n",
    "        \"quantize\": \"Q4_0\"\n",
    "    }\n",
    "})\n",
    "ann.index(data)\n",
    "ann.save(\"vectors.gguf\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bea0f58",
   "metadata": {},
   "source": [
    "Now let's check out the generated file using the [gguf](https://github.com/ggml-org/llama.cpp/tree/master/gguf-py) package provided by llama.cpp."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "b0e6f85e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tensor Name                    | Shape           | Size         | Quantization\n",
      "--------------------------------------------------------------------------------\n",
      "data                           | 384x1411868     | 258.52 MB    | Q4_0\n"
     ]
    }
   ],
   "source": [
    "from gguf.gguf_reader import GGUFReader\n",
    "\n",
    "reader = GGUFReader(\"vectors.gguf\")\n",
    "\n",
    "# List all tensors\n",
    "info = \"{:<30} | {:<15} | {:<12} | {}\"\n",
    "print(info.format(\"Tensor Name\", \"Shape\", \"Size\", \"Quantization\"))\n",
    "print(\"-\" * 80)\n",
    "for tensor in reader.tensors:\n",
    "    shape = \"x\".join(map(str, tensor.shape))\n",
    "    size = f\"{tensor.n_elements / 2 / 1024 / 1024:.2f} MB\"\n",
    "    quantization = tensor.tensor_type.name\n",
    "    print(info.format(tensor.name, shape, size, quantization))"
   ]
  },
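  {
   "cell_type": "markdown",
   "id": "5a1c9e05",
   "metadata": {},
   "source": [
    "The size printed above counts only the packed 4-bit values. As a rough check, the sketch below also accounts for block scales, assuming the standard GGML `Q4_0` layout of 32 values per block stored as 16 bytes of 4-bit integers plus a 2-byte float16 scale. File metadata is not included.\n",
    "\n",
    "```python\n",
    "rows, dimensions = 1411868, 384\n",
    "elements = rows * dimensions\n",
    "\n",
    "# Assumed Q4_0 block: 32 values in 16 bytes plus a 2-byte float16 scale\n",
    "blocksize, blockbytes = 32, 18\n",
    "\n",
    "packed = elements / 2 / 1024 / 1024\n",
    "full = elements // blocksize * blockbytes / 1024 / 1024\n",
    "\n",
    "print(f\"4-bit values only = {packed:.2f} MB\")\n",
    "print(f\"With block scales = {full:.2f} MB\")\n",
    "```\n",
    "\n",
    "Under that assumption the tensor data comes to about 290.84 MB, in line with the quantized Safetensors file from the Torch backend."
   ]
  },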
  {
   "cell_type": "markdown",
   "id": "7be79bf7",
   "metadata": {},
   "source": [
    "And search like we did with Torch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "450dc024",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The answer to your question, that how many miles is it from earth to mars, is very easy to know. Because of huge satellites which are being sent to\n",
      "mars in search of life from many countries, we have discovered a lot about mars. According to experts, earth and mars reaches to their closest points\n",
      "in every 26 months. This situation is considered as opposition of mars as the location of sun and mars in totally opposite to each other in relation\n",
      "to earth. When this opposition takes place, the planet is visible with a red tint in the sky from earth. And this also gives mars a name, i.e. the red\n",
      "planet. Mars is also the fourth planet from sun, which is located between Jupiter and earth. Its distance from sun is not only opposite but is also\n",
      "much further away, than that of the earth and sun. The distance between the sun and mars is said to be 140 million miles. Mars can reach about 128\n",
      "million miles closer to the sun whereas it can even travel around 154 million miles away from it. The assumed distance between mars and earth is said\n",
      "to be between 40 to 225 million miles. The distance between these two planets keeps on changing throughout the year because of the elliptical path in\n",
      "which all the planets rotate. As the distance between mars, sun and earth is so much high, it takes a Martian year, for mars to go around the sun. The\n",
      "Martian period includes a time of around 687 earth days. This means that, it takes more than 2 years for the mars to reach its initial rotation point.\n",
      "If we talk about one Martian day, it is the total time which is taken by a planet to spin around once. This day usually lasts longer than our regular\n",
      "earth days. So this was the actual reason which states the distance between earth and mars. \n",
      "\n",
      " 0.7043964862823486\n"
     ]
    }
   ],
   "source": [
    "search(\"How far is earth from mars?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af1a344c",
   "metadata": {},
   "source": [
    "# Wrapping up\n",
    "\n",
    "While the `Embeddings` interface is the preferred way to build vector databases with `txtai`, it's entirely possible to also build with the low level APIs!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "local",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
